151 research outputs found

    Integrating protein-protein interactions and text mining for protein function prediction

    Background: Functional annotation of proteins remains a challenging task. Currently the scientific literature serves as the main source for yet uncurated functional annotations, but curation work is slow and expensive. Automatic techniques that support this work still lack reliability. We developed a method to identify conserved protein interaction graphs and to predict missing protein functions from orthologs in these graphs. To enhance the precision of the results, we furthermore implemented a procedure that validates all predictions based on findings reported in the literature. Results: Using this procedure, more than 80% of the GO annotations for proteins with highly conserved orthologs that are available in UniProtKB/Swiss-Prot could be verified automatically. For a subset of proteins we predicted new GO annotations that were not available in UniProtKB/Swiss-Prot. All predictions were correct (100% precision) according to the verifications from a trained curator. Conclusion: Our method of integrating CCSs and literature mining is thus a highly reliable approach to predict GO annotations for weakly characterized proteins with orthologs.

    A rare benign disorder mimicking metastasis on radiographic examination: a case report of osteopoikilosis

    Osteopoikilosis (OPK) is a rare, autosomal dominant bone disorder characterized by multiple, discrete round or ovoid radiodensities scattered throughout the axial and appendicular skeleton. OPK is usually asymptomatic, but in rare cases there may be slight articular pain and joint effusions. OPK is generally diagnosed incidentally on radiographic examinations and may mimic different bone pathologies, including bone metastases. Radionuclide bone scan has a critical role in distinguishing OPK from osteoblastic bone metastases. In this case report, we present a young man with right hip pain due to OPK, whose plain radiogram and computerized tomography findings were thought to indicate cancer metastasis

    Metrics for GO based protein semantic similarity: a systematic evaluation

    Background: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether electronic annotations should be used in semantic similarity calculations. Results: We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation. Conclusions: This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.
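
    The two best-performing approaches named in this abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions: simGIC as a Jaccard index weighted by information content (IC), and best-match average as the mean of each term's best match in the other set, taken in both directions. The term names, IC values, and the exact-match term similarity below are hypothetical toy data, not values from the paper, and the term sets are assumed to already include ancestor terms.

```python
from typing import Callable, Dict, Set


def sim_gic(terms_a: Set[str], terms_b: Set[str], ic: Dict[str, float]) -> float:
    """Weighted Jaccard index: summed IC of shared terms over summed IC of all terms."""
    denom = sum(ic[t] for t in terms_a | terms_b)
    if denom == 0:
        return 0.0
    return sum(ic[t] for t in terms_a & terms_b) / denom


def best_match_average(
    terms_a: Set[str],
    terms_b: Set[str],
    term_sim: Callable[[str, str], float],
) -> float:
    """Average each term's best match in the other set, in both directions."""
    if not terms_a or not terms_b:
        return 0.0
    a_to_b = sum(max(term_sim(a, b) for b in terms_b) for a in terms_a) / len(terms_a)
    b_to_a = sum(max(term_sim(a, b) for a in terms_a) for b in terms_b) / len(terms_b)
    return (a_to_b + b_to_a) / 2


# Hypothetical toy annotations (ancestors already included) and IC values.
IC = {"root": 0.0, "binding": 1.0, "kinase": 2.0, "transport": 3.0}
PROTEIN_A = {"root", "binding", "kinase"}
PROTEIN_B = {"root", "binding", "transport"}


def exact_match(a: str, b: str) -> float:
    """Placeholder term similarity: 1.0 for identical terms, else 0.0."""
    return 1.0 if a == b else 0.0


gic_score = sim_gic(PROTEIN_A, PROTEIN_B, IC)
bma_score = best_match_average(PROTEIN_A, PROTEIN_B, exact_match)
```

    In practice the placeholder `exact_match` would be replaced by a term-level measure such as Resnik's. Because best-match average normalizes by the size of each term set, it avoids the dependence on the number of terms that the abstract reports for the plain average and maximum approaches.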

    BiOnt: Deep Learning using Multiple Biomedical Ontologies for Relation Extraction

    Successful biomedical relation extraction can provide evidence to researchers and clinicians about possible unknown associations between biomedical entities, advancing the current knowledge we have about those entities and their inherent mechanisms. Most biomedical relation extraction systems do not resort to external sources of knowledge, such as domain-specific ontologies. However, using deep learning methods along with biomedical ontologies has recently been shown to effectively advance the biomedical relation extraction field. To perform relation extraction, our deep learning system, BiOnt, employs four types of biomedical ontologies, namely the Gene Ontology, the Human Phenotype Ontology, the Human Disease Ontology, and the Chemical Entities of Biological Interest, regarding gene products, phenotypes, diseases, and chemical compounds, respectively. We tested our system with three data sets that represent three different types of relations between biomedical entities. BiOnt achieved, in F-score, an improvement of 4.93 percentage points for drug-drug interactions (DDI corpus), 4.99 percentage points for phenotype-gene relations (PGR corpus), and 2.21 percentage points for chemical-induced disease relations (BC5CDR corpus), relative to the state of the art. The code supporting this system is available at https://github.com/lasigeBioTM/BiOnt. Comment: ECIR 202

    Identification of disease-causing genes using microarray data mining and gene ontology

    Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. 
Conclusions: The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with a high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancers
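
    The filtering stage described in this abstract can be sketched in Python as follows. The abstract does not state which form of the Fisher criterion is used, so this sketch assumes a common two-class version: the squared difference of class means divided by the sum of class variances. The gene names and expression values are hypothetical, and the SVM-RFE, redundancy-reduction, and Gene Ontology stages are not shown.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """Two-class Fisher criterion for one gene (assumed form, see lead-in)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Constant within both classes: perfectly separating or uninformative.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_genes(expression: Dict[str, Sequence[float]], labels: Sequence[int]) -> List[str]:
    """Order genes from most to least discriminative by Fisher score."""
    return sorted(expression, key=lambda g: fisher_score(expression[g], labels), reverse=True)


# Hypothetical toy data: two diseased samples (1) and two healthy samples (0).
LABELS = [1, 1, 0, 0]
EXPRESSION = {
    "gene_a": [5.0, 7.0, 1.0, 3.0],  # well separated between classes
    "gene_b": [1.0, 3.0, 2.0, 4.0],  # heavily overlapping
}

ranking = rank_genes(EXPRESSION, LABELS)
```

    A filter like this is cheap enough to run over every gene, which is why it suits a first pass before a costlier embedded method such as SVM-RFE.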

    The Antioxidant Potential of the Mediterranean Diet in Patients at High Cardiovascular Risk: An In-Depth Review of the PREDIMED

    Cardiovascular disease (CVD) is the leading global cause of death. Diet is known to be important in the prevention of CVD. The PREDIMED trial tested a relatively low-fat diet versus a high-fat Mediterranean diet (MedDiet) for the primary prevention of CVD. The observed reduction in the composite CV outcome led to a paradigm shift in CV nutrition. Though many dietary factors likely contributed to this effect, this review focuses on the influence of the MedDiet on endogenous antioxidant systems and the effect of dietary polyphenols. Subgroup analysis of the PREDIMED trial revealed increased endogenous antioxidant and decreased pro-oxidant activity in the MedDiet groups. Moreover, higher polyphenol intake was associated with lower incidence of the primary outcome, overall mortality, blood pressure, inflammatory biomarkers, new-onset type 2 diabetes mellitus (T2DM), and obesity. This suggests that polyphenols likely contributed to the lower incidence of the primary event in the MedDiet groups. In this article, we summarize the potential benefits of polyphenols found in the MedDiet, specifically in the PREDIMED cohort. We also discuss the need for further research to confirm and expand the findings of the PREDIMED in a non-Mediterranean population and to determine the exact mechanisms of action of polyphenols

    Learning pair-wise gene functional similarity by multiplex gene expression maps

    Background: The relationships between the gene functional similarity and gene expression profile, and between gene function annotation and gene sequence have been studied extensively. However, not much work has considered the connection between gene functions and location of a gene's expression in the mammalian tissues. On the other hand, although unsupervised learning methods have been commonly used in functional genomics, supervised learning cannot be directly applied to a set of normal genes without having a target (class) attribute. Results: Here, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps that provide information about the location of gene expression. The features are extracted from expression maps and the labels denote the functional similarities of pairs of genes. We make use of wavelet features, original expression values, difference and average values of neighboring voxels and other features to perform boosting analysis. The experimental results show that with increasing similarities of gene expression maps, the functional similarities are increased too. The model predicts the functional similarities between genes to a certain degree. The weights of the features in the model indicate the features that are more significant for this prediction. Conclusions: By considering pairs of genes, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps. We also explore the relationship between similarities of gene maps and gene functions. By using AdaBoost coupled with our proposed weak classifier we analyze a large-scale gene expression dataset and predict gene functional similarities. We also detect the most significant single voxels and pairs of neighboring voxels and visualize them in the expression map image of a mouse brain. This work is very important for predicting functions of unknown genes. It also has broader applicability since the methodology can be applied to analyze any large-scale dataset without a target attribute and is not restricted to gene expressions

    Differential expression of follistatin and FLRG in human breast proliferative disorders

    Background: Activins are growth factors acting on cell growth and differentiation. Activins are expressed in high grade breast tumors and they display an antiproliferative effect inducing G0/G1 cell cycle arrest in breast cancer cell lines. Follistatin and follistatin-related gene (FLRG) bind and neutralize activins. In order to establish if these activin binding proteins are involved in breast tumor progression, the present study evaluated follistatin and FLRG pattern of mRNA and protein expression in normal human breast tissue and in different breast proliferative diseases. Methods: Paraffin embedded specimens of normal breast (NB - n = 8); florid hyperplasia without atypia (FH - n = 17); fibroadenoma (FIB - n = 17); ductal carcinoma in situ (DCIS - n = 10) and infiltrating ductal carcinoma (IDC - n = 15) were processed for follistatin and FLRG immunohistochemistry and in situ hybridization. The area and intensity of chromogen epithelial and stromal staining were analyzed semi-quantitatively. Results: Follistatin and FLRG were expressed both in normal tissue and in all the breast diseases investigated. Follistatin staining was detected in the epithelial cytoplasm and nucleus in normal, benign and malignant breast tissue, with a stronger staining intensity in the peri-alveolar stromal cells of FIB at both mRNA and protein levels. Conversely, FLRG area and intensity of mRNA and protein staining were higher both in the cytoplasm and in the nucleus of IDC epithelial cells when compared to NB, while no significant changes in the stromal intensity were observed in all the proliferative diseases analyzed. Conclusion: The present findings suggest a role for follistatin in breast benign disease, particularly in FIB, where its expression was increased in stromal cells. The up regulation of FLRG in IDC suggests a role for this protein in the progression of breast malignancy. As activin displays an anti-proliferative effect in human mammary cells, the present findings indicate that an increased FST and FLRG expression in breast proliferative diseases might counteract the anti-proliferative effects of activin in human breast cancer.

    An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology

    Background: Semantic similarity measures are useful to assess the physiological relevance of protein-protein interactions (PPIs). They quantify similarity between proteins based on their function using annotation systems like the Gene Ontology (GO). Proteins that interact in the cell are likely to be in similar locations or involved in similar biological processes compared to proteins that do not interact. Thus the more semantically similar the gene function annotations are among the interacting proteins, the more likely the interaction is physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies in different classes of cellular location, molecular function, and biological process ontologies of GO and thus may over- or underestimate similarity. Results: We describe an improved algorithm, Topological Clustering Semantic Similarity (TCSS), to compute semantic similarity between GO terms annotated to proteins in interaction datasets. Our algorithm considers unequal depth of biological knowledge representation in different branches of the GO graph. The central idea is to divide the GO graph into sub-graphs and score PPIs higher if participating proteins belong to the same sub-graph as compared to if they belong to different sub-graphs. Conclusions: The TCSS algorithm performs better than other semantic similarity measurement techniques that we evaluated in terms of their performance on distinguishing true from false protein interactions, and correlation with gene expression and protein families. We show an average improvement of 4.6 times the F1 score over Resnik, the next best method, on our Saccharomyces cerevisiae PPI dataset and 2 times on our Homo sapiens PPI dataset using cellular component, biological process and molecular function GO annotations.
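
    The abstract does not give enough detail to reproduce TCSS, so the Python sketch below illustrates only the Resnik baseline it is compared against, under the standard definition: the similarity of two terms is the information content, -log p(t), of their most informative common ancestor. The small term graph and annotation probabilities are hypothetical and are not taken from GO.

```python
import math
from typing import Dict, Set

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "membrane": {"root"},
    "nucleus": {"root"},
    "plasma_membrane": {"membrane"},
    "nuclear_membrane": {"membrane", "nucleus"},
}

# Hypothetical annotation probabilities p(t); the root covers every annotation.
PROB: Dict[str, float] = {
    "root": 1.0,
    "membrane": 0.5,
    "nucleus": 0.4,
    "plasma_membrane": 0.2,
    "nuclear_membrane": 0.1,
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(a: str, b: str, parents: Dict[str, Set[str]], prob: Dict[str, float]) -> float:
    """Information content of the most informative common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return max(math.log(1.0 / prob[t]) for t in common)


sim_membranes = resnik("plasma_membrane", "nuclear_membrane", PARENTS, PROB)
sim_unrelated = resnik("plasma_membrane", "nucleus", PARENTS, PROB)
```

    Because the score depends only on the shared ancestor, two terms deep in a sparsely annotated branch can score very differently from two equally related terms in a shallow branch. That is the unequal-depth issue the abstract says TCSS is designed to address.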